We  present  a  modification  of  the  Fast  Multipole  Method  (FMM)  in  two  dimensions.  While 
previous  implementations  of  the  FMM  have  been  designed  for  harmonic  kernels,  our  algo¬ 
rithm  works  for  a  large  class  of  kernels  that  satisfy  fairly  general  conditions,  amounting  to 
the  kernel  being  sufficiently  smooth  away  from  the  diagonal.  Our  algorithm  approximates 
appropriately  chosen  parts  of  the  kernel  with  “tensor  products”  of  Legendre  expansions  and 
uses  the  Singular  Value  Decomposition  (SVD)  to  compress  the  resulting  representations. 
The  obtained  singular  function  expansions  replace  the  Taylor  and  Laurent  expansions  used 
in  the  original  FMM.  The  algorithm  requires  O(N)  operations,  and  is  stable  and  robust. The 
performance  of  the  algorithm  is  illustrated  with  numerical  examples. 
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1  Introduction 


In  this  paper,  we  describe  a  fast  algorithm  for  the  evaluation  of  all  pairwise  interactions  in 
large  ensembles  of  particles  in  the  plane,  i.e.,  sums  of  the  form 

N 

u(xi)  =  'HT/qjK{xi,xj),  (1) 

j= i 

where  qi,.-.,qw  are  arbitrary  complex  numbers,  x\,...,xn  are  points  in  the  plane,  and 
K  :  R2  -»  R2  is  a  non-oscillatory  kernel.  Such  computations  appear  in  a  variety  of  numerical 
methods  for  the  solution  of  problems  of  computational  physics. 

The  algorithm  of  this  paper  is  a  version  of  the  Fast  Multipole  Method  (FMM)  in  two 
dimensions.  The  structure  of  the  FMM  algorithm  is  left  virtually  unchanged  from  the  one 
described  by  in  [3].  The  version  of  the  FMM  algorithm  used  in  this  paper,  however,  replaces 
the  Taylor  and  Laurent  expansions  with  “tensor  products”  of  Legendre  expansions  that  are 
subsequently  compressed  via  the  Singular  Value  Decomposition  (SVD).  This  approach  leads 
to  an  algorithm  that  can  be  applied  to  a  variety  of  non-oscillatory  kernels  that  are  sufficiently 
smooth  away  from  the  diagonal. 

In  two  dimensions,  the  original  Fast  Multipole  Method  (FMM)  relies  on  the  Taylor  and 
Laurent  expansions  (see  [14],  [7])  for  the  evaluation  of  Coulomb  interactions  in  large  ensem¬ 
bles  of  particles.  During  the  last  decade,  several  improvements  of  the  original  scheme  have 
been  suggested.  A  new  version  of  the  FMM,  based  on  specially  designed  singular  function 
expansions,  was  introduced  in  [10].  The  approach  taken  in  the  latter  paper,  when  used  in 
combination  with  an  intermediate  representation  consisting  of  complex  exponentials,  leads 
to  an  algorithm  that  is  about  five  times  as  fast  as  the  original  FMM,  due  to  the  reduction 
of  the  number  of  parameters  needed  to  represent  far  and  near  fields.  A  similar  technique 
was  used  in  one  dimension  in  [18].  A  version  of  the  FMM  for  polynomial  interpolation 
(see  [5])  uses  Chebyshev  expansions  that  are  compressed  by  a  suitable  change  of  basis  ob¬ 
tained  via  Singular  Value  Decomposition  (SVD).  Finally,  an  analytical  apparatus  based  on 
least  squares  approximation  of  integral  operators  was  developed  in  [17].  This  analytical 
apparatus  leads  to  fast  algorithms  for  a  fairly  large  class  of  kernels  in  one  dimension. 

The  plan  of  the  paper  is  as  follows.  In  Section  2,  we  introduce  mathematical  and 
numerical  preliminaries.  In  Sections  3  and  4,  we  describe  a  generalized  Fast  Multipole 
Method  in  two  dimensions  and  present  the  complexity  analysis.  Finally,  in  Section  5,  we 
demonstrate  the  performance  of  the  algorithm  with  several  numerical  examples. 

2  Mathematical  Preliminaries 

2.1  Gaussian  Integration  and  Interpolation 

In  what  follows,  we  will  denote  by  P“,b  the  n-th  Legendre  polynomial  on  the  interval  [a,  6]  C 
R.  We  will  refer  to  the  roots  . . .  ,x%b  of  P“,6(z)  as  the  Gaussian  nodes  of  order  n  and 
will  denote  by  io“’6, . . . ,  to“>6  the  weights  of  the  corresponding  Gaussian  quadrature  on  the 
interval  [a,  6].  We  will  denote  by  Ln  the  projection  from  the  space  of  continuous  functions 
on  the  interval  [a,  6]  to  the  space  of  polynomials  of  order  n,  preserving  the  function  values  at 
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the  Gaussian  nodes.  For  a  given  continuous  function  /  :  [a,  6]  — »  C,  the  function  Lnf(x)  is 
the  polynomial  of  order  n  such  that  Lnf(x^b)  =  f(x%£).  As  is  well  known,  for  all  a;  €  [a,  6], 


Lnf(x)  =  Y/ak-P^b(x),  (2) 

fc=0 

and  the  coefficients  a*  axe  given  by  the  formula 

ak=j2waJ-f(xaJ).Pk(xaJ).  (3) 

771=1 

The  polynomial  Lnf  will  be  referred  to  as  n-th  order  Legendre  expansion  of  the  function 
/.  For  any  integer  n  we  will  denote  by  ||Ln||oo  the  L°°-norm  of  the  operator  Ln,  defined  by 
the  formula 

Hindoo  =  Sup  ||in/||L«[a,6]-  (4) 

ll/llL°°[a.6]=l 

We  will  denote  by  ai(ar), . . . ,  an(x)  the  set  of  polynomials  of  order  n  defined  by  the 
formulae 

<Xi(x)=  JJ  ^ — ~,  i  =  1,2, . . .  ,n,  (5) 

where  x\, . . . ,  xk  are  the  Gaussian  nodes  of  order  n  on  the  interval  [a,  6].  It  is  readily  seen 
from  (5)  that  for  any  continuous  function  /  :  [a,  b]  -¥  C, 

Lnf{x)  =  ‘  pk(x )  =  2  (x»)  •  “»(*)•  (6) 

k=0  i=l 

For  any  natural  n  and  continuous  function  /  :  [a,  b]  -*  C,  we  will  denote  by  Enf  the 
error  of  the  best  approximation  to  /  among  all  polynomials  of  order  n,  i.e., 

Enf  =  nun  ||/  -  P||L~[a,6]-  (7) 

Let  p  >  0  be  an  arbitrary  positive  real  number.  For  any  analytic  function  f  :  C  C,  we 
will  denote  by  M([a,  b],  f,  p)  the  maximum  of  the  absolute  value  of  /  in  the  p-neighborhood 
of  the  interval  [a,  6],  i.e., 


M([a,b],f,p)  =  sup  sup  \f{x  +  peie)\.  (8) 

z€[a,6]  0€[-5r,ir] 

The  following  five  lemmas  are  well  known.  Their  proofs  can  be  found,  for  example,  in 

[16],  [12]. 

Lemma  2.1.  If  n  >  0  is  an  integer,  and  P  :  C  — >  C  is  a  polynomial  of  order  n,  then  for 
any  interval  [a,  b]  C  R, 

yj=^ll^llL2[a,6]  <  ll-P|lL“[a,6]  <  yj=^l|P||i2[a,6]-  (9) 
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(10) 


Lemma  2.2.  For  any  continuous  function  f  :  [a,  6]  — >•  C, 

11/  ~  Lnf\\L°°[a,b]  <  (1  +  ||Ln||oo)  '  ||/  —  -Sn/||z,°°[a>6]- 

Lemma  2.3.  For  any  n  times  continuously  differentiable  function  f  :  [a,  6]  — >  C, 

11/  -  S,/lk~M  <  (11) 

Lemma  2.4.  If  f  :  C  — >•  C  is  an  analytic  function,  then  for  any  positive  real  p  >  0, 

ll/WII^M]<n!-M([°'^’/’rt.  (12) 

Lemma  2.5.  For  any  natural  n, 

I  l-L'n  |  |oo  <  n-  (13) 

By  combining  (9),  (10),  (11),  (12),  and  (13),  we  obtain  the  following  theorem  describing 
the  rate  of  convergence  of  Legendre  expansions  of  an  analytic  function  on  the  interval  [a,  6]. 

Lemma  2.6.  Suppose  that  f  :  C  C  is  an  analytic  function,  and  that  for  some  positive 
p>  (b-  a)/ 4, 

M([a,b],f,p)  <  oo.  (14) 


Then 


(15) 


(16) 


Jim  ||/-Ln/||L«M]  =  0. 

Furthermore,  for  any  n  >  1, 

11/  -  ^/!Il~[o,6]  <  2(1  +  n)  ■  M([a,b],f, p)  •  (j^f  ■ 

A  standard  approach  to  the  construction  of  polynomial  approximations  of  functions  in 
higher  dimensions  is  to  expand  them  into  “tensor  products”  of  one-dimensional  Legendre 
polynomials.  For  an  m-dimensional  cube  Q  =  [fli ,  &i]  x . . .  x  [am,  bm ]  and  continuous  function 
/  :  Q  ->  C,  we  will  denote  by  Lnf  the  (unique)  polynomial  of  m  variables  having  the  form 

Lnf{Xl,  ■  •  •  ,*rn)  =  E  •  •  •  E  ° *1 •  Pkl,bl^l)  ■■■■■  P£’bm(Xm),  (17) 

fcl=0  km  =0 

and  coinciding  with  /  on  the  nm  “tensor  product”  Gaussian  nodes 

k\  =  1, . . . , n; . . . ; fcm  =  1, . . . , n;  (18) 

the  coefficients  are  given  by  the  formula 

. *.=  E  E  . 

h\ — 0  km — 0 

(19) 
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In  a  mild  abuse  of  terminology,  we  will  be  referring  to  such  polynomials  as  polynomials  of 
order  n  in  Rm  and  to  expansions  of  the  form  (17)  as  Legendre  expansions  of  order  n  in  the 
cube  Q  G  Rm.  For  an  analytic  function  /  :  Cm  — >  C,  we  will  denote  by  M(Q,f,p)  the 
maximum  of  the  absolute  value  of  /  in  the  p-neighborhood  of  the  cube  Q,  i.e., 

M(Q,f,p)  =  max  sup  sup  |/R, . . .  ,xk  +  peld, . . .  ,xm)|.  (20) 

The  following  two  lemmas  are  a  simple  consequence  of  Lemmas  2.1  and  2.6;  they  can 
be  viewed  as  multidimensional  analogues  of  the  latter  (see  for  example  [17]). 

Lemma  2.7.  If  n  >  0  is  an  integer  and  P  :  Cm  C  is  a  polynomial  of  order  n,  then  for 


any  cube  Q  =  [o,  b]m  C  Rm, 

S  nh-<e>  £  (21) 

Lemma  2.8.  Suppose  that  f  :  Cm  — >  C  is  an  analytic  function  on  Cm,  and  that  for  some 
positive  p  >  (b  —  a)  /4, 

M([a,b]m,f,p)  <  oo.  (22) 

Then,  for  any  n  >  1, 

<2(l+nr-M([o,i>r./,rt-  (^)”'  (23) 

2.2  Singular  Value  Decomposition  of  Integral  Operators 

Let  T  :  L2(Y)  — >  L2(X)  be  integral  operator  given  by  the  formula 

(! T-f)(x)=JYK(x,y)f(y)dy ,  (24) 

where  if  is  a  square  integrable  function  on  X  x  Y,  i.e., 

\\K(x,y)\\mxxY)  =  (f  fXxY  \K(x,y)\2dxdy^j  < +oo.  (25) 


The  function  K  :  X  xY  R  is  usually  referred  to  as  the  kernel  of  the  integral  operator  T. 
The  following  theorem  can  be  found  (in  a  more  general  form)  in  [15]. 

Theorem  2.9.  For  any  K  G  L2(X  x  Y),  there  exist  two  orthonormal  systems  of  functions 
R}  €  L2(X),  R}  G  L2(Y),  and  a  sequence  of  nonnegative  numbers  Si  >  S2  >  . . .  >  0, 
for  k  —  1,2,...,  such  that 

oo 

K{x,  y)  =  uk  {. x )  skvk  (y) ,  (26) 

k= 1 

in  L2(X  x  Y)  sense, 

OO 

H  |sfc|2  <  +CX),  (27) 

fc=i 

and  the  sequence  {s*,}  is  uniquely  determined  by  K. 
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Formula  (26)  is  normally  referred  to  as  the  singular  value  decomposition  (SVD)  of  the 
operator  T  (or  the  kernel  K).  The  functions  uk  and  Vk  are  usually  referred  to  as  the  left 
and  the  right  singular  functions,  respectively,  and  the  numbers  Sk  are  referred  to  as  singular 
values  of  the  operator  K  (or  the  kernel  K). 

The  singular  value  decomposition  can  be  used  to  construct  finite-dimensional  approx¬ 
imations  to  the  operators  of  the  form  (24)  and  the  corresponding  kernels  K.  Specifically, 
given  a  positive  real  e  >  0,  one  can  truncate  the  expression  (26)  after  a  finite  number  p  of 
terms,  leading  to  the  expression 

v 

K(x,y)  «  ^2uk(x)skVk{y).  (28) 

k= 1 

Now,  if  p  has  been  chosen  in  such  a  manner  that 


\ 


£  sl^e< 


k=p+ 1 


(29) 


then  due  to  (26), 

v 

II K(x,y)  -  53u*(z)s*ufc(y)lli2(xxy)  ^  e-  (30) 

jfc=i 

Theorem  2.10  (Minimal  property  of  the  SVD).  Suppose  that  the  SVD  of  the  opera¬ 
tor  T  :  L2(Y)  — >  L2(X)  with  the  kernel  K  :  X  xY  R  is  given  by  the  formula 

00 

K(x,y)  =  ^2uk{x)skvk{y).  (31) 

k=l 


Then  for  any  f  6  L2(Y), 

p 

||(T  •  f)(x)  -  2  uk(x)skbk ||l*(X)  <  Sp+i\\f\\LHY),  (32) 

k-l 

where  the  coefficients  bk  are  given  by  the  formula 

h=  JYf{y)vk{y)dy.  (33) 

2.3  Approximation  of  the  SVD  of  Integrals  Operators 

The  following  theorem  is  a  straightforward  generalization  of  Theorem  2.10. 

Theorem  2.11  (Approximation  of  the  SVD).  Suppose  that  the  operator  T  :  L2(Y)  — >• 
L2(X)  is  defined  by  (24),  that  there  exist  a  positive  number  6  >  0  and  a  square  integrable 
function  K  :  X  xY  -»  R  such  that 

\\K(x,y)-K{x,y)\\L2{XxY)<6,  (34) 
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and  that  the  SVD  of  K  is  given  by  the  formula 

oo 

K(x,y)  =  ^2uk{x)skVk{y)-  (35) 

k=  1 

Then  for  any  f  €  L2(Y), 

p 

||(T  •/)(£)  -  ^&fc(z)SA:i>fc||z,2pQ  -  +  Sp+l)||/||i2(y),  (36) 

fc=i 

where  the  coefficients  are  given  by  the  formula 

h=  [  f{y)vk{y)dy.  (37) 

J  Y 

Proof.  Obviously,  (34)  implies 

II  [  K(x,y)f(y)dy-  [  K(x,y)f{y)dy\\L2{x)  <  $||/||La(y),  (38) 

J  Y  Ji 

and  from  Theorem  2.10,  we  obtain 

r  p 

II  /  K{x,y)f(y)dy  -  £  uk(x)hh\\mx)  <  ®p+i||/||L»(y)-  (39) 

JY  k= 1 

Now,  (36)  follows  immediately  from  (38),  (39),  and  the  triangle  inequality.  □ 


3  Analytical  Apparatus 

In  the  remainder  of  this  paper,  we  will  be  assuming  that  all  charges  are  located  in  a  unit 
square  [0, 1]  x  [0, 1]  in  R2. 


3.1  Notation 

We  will  denote  by  Y^l,kl,k 2)  the  square 


y(i,fcl,fc2) 


k\  —  1  ki 

/?2  — •  1  &2 

2l  ’  2l\ 

X 

2l  ’  2l  . 

(40) 


where  l  >  1,  fci  =  1, . . . ,  2l,  =  1, . . . ,  2*;  l  will  be  referred  to  as  the  level  of  the  square 

fc2)  will  be  referred  to  as  the  coordinates  of  the  square  Y^l,k  1,k2\  We  will 
denote  by  Z^l,kuk2^  the  union  of  the  square  Y^l’kl’k^  and  its  immediate  neighbors  on  the 
level  l.  We  will  denote  the  subset  X^l,kl,k2^  of  [0, 1]  x  [0, 1]  by  the  formula 


xVMte)  _  [0>  j]  x  [o,  l]  \  z^'k2\  (41) 

and  refer  to  X^l,kl,k2^  as  the  interaction  domain  of  the  square  Y^kl'k2>.  In  other  words, 
the  interaction  domain  of  the  square  Y^k  1’k2'>  consists  of  all  squares  on  level  l  that  are 
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not  immediate  neighbors  of  y0>fci>fc2)  and  not  Y^l,k  1,fc2)  itself.  For  consistency,  we  will  also 
referring  to  the  unit  square  [0, 1]  x  [0, 1]  as  yl0’1’1). 

Suppose  now  that  the  function  K  :  y(0,1,1)  x  yt0-1’1)  -4  C  is  such  that 


[  (  [  \K(x,y)\2  dy]  dx  <  +  oo, 

(42) 

and 

[  (  f  \K(y,x)\2  dx)  dy  < +00, 

(43) 

for  all  l  >  1,  ki  = 

1, . . . ,  2f,  k2  —  For  any  square  Y^’kiM)^ 

we  will  define  the 

integral  operators 

p(i,fei,fc2)  .  £2(y(i,fci’*2))  -4  L2(x(i’ki,k2')), 

(44) 

p(lM,k2)  .  L2(x(l’ fcl’*2))  -4  L2(Y^l’k  1>fc2))j 

(45) 

by  the  formulae 

(■ P{lMM )  •  or)  (a?)  =  f  K{x,  y)cr(y)  dy, 

JyVMM) 

(46) 

(R{l'kl M)  ■  <r) (y)  =  f  ...  .K (y, ®)a(s) dx. 
JxV’k  i-fc2) 

(47) 

The  function  (p0’fei>fe2)  .  a)  G  L2(X^l,kl,k2^)  with  a  G  L2(Y^l,kl,k^)  will  be  referred  to 
as  the  potential  due  to  the  charge  distribution  a  on  the  square  Y^l,kl,k2\  Similarly,  the 
function  (B^l’kl,k^  ■  a)  €  L2(Y^l,kl'k^)  with  cr  €  L2(X^l’kl’k2^)  will  be  referred  to  as  the 
incoming  potential  due  to  some  charge  distribution  a  on  X^l,kl,k2\ 

Due  to  (42),  (43),  and  Theorem  2.9,  there  exist  functions 


r  in, (l,ki,k2) 
l  uk 

}  G  p2(y(I,fci,fc2)))  ^out,(l,fci,*:2)j  g  jr^2^y(i,fci,fc2)^ 

(48) 

{u^lt^l’kl,k^}  G  L\X^’k2\  {v™<(l>ki,k2) |  ^  I/2 (X^’*1’*2)), 

(49) 

and  positive  real  numbers 

r  in,(l,kuk2)x  ;  out,(l,ki,k2)x 

\sk  h  \sk  /i 

(50) 

such  that 

i(T(a:,y) 

00 

_  ^  u<^t,(l,ki,k2)  ,k2) vout,(l,k!,k2)  (yj 

k= 1 

(51) 

K(y,x) 

OO 

_  ^  u«i,(Z,fci,fc2)  ^sm,(l,kuk2) vin,(l,ki,k2)  ^ 

k=l 

(52) 

We  will  refer  to  (51),  (52)  as  the  outgoing  and  incoming  singular  value  decompositions  for 
the  square  Y^l’kl,k2\  respectively. 

We  will  be  using  finite-dimensional  approximations  to  the  operators  (44),  (45)  obtained 
by  truncating  expressions  (51),  (52)  after  a  finite  number  of  terms.  Specifically,  given  two 
natural  numbers  pi  and  rj,  we  will  define  the  operators 

p(L,ki,k2)  .  L2(Y(i’ki>k2))  j} (x(i,fcl,fc2)), 

pQMte)  .  x,2(x(1,fcl,fc2))'  ->  i2(y(2.fci>fc2)) 
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(53) 

(54) 


by  the  formulae 


r)(i)  =  lYgtMKp,{x,y)a(y)dy, 

(55) 

(Rti  kl M)  ■  a)  (y)  =  Kri  (y,  x)a{x)  dx, 

JX'1’*  1**2) 

(56) 

with 

KPl  (x,  y)  =  jr  u^{lM M)  (x)s^lM  ’k2\°kut«’k' ^  (y), 

k= 1 

(57) 

Kn(y,x)  =  i^)(a.). 

fc=i 

(58) 

Substituting  (57),  (58)  into  (55),  (56),  we  obtain 

Pi 

(pVMM)  .  f)(x)  =  y;  a°ut,(l,ki,k2) 

k=l 

(59) 

with  the  coefficients  a<^it^l’kl,k2^  given  by  the  formula 

(60) 

and 

[R^kiM)  .a)(y)  =  ^  u™’(l,kl’k2\y)s™^l’kl,k^ a^1*1’^, 
k=l 

(61) 

with  the  coefficients  a™’(l'kl,k2)  given  by  the  formula 

=  J  ^  ^  v^l’kl'k2\x)a{x )  dx. 

(62) 

The  function  ■  a)  e  L2(X^k^)  with  a  €  L2(Y^k **>))  will  be  referred  to 

as  the  outgoing  singular  function  expansion  due  to  the  charge  distribution  a  on  the  square 
Similarly,  the  function  ■  a)  €  L2(X^l’kl’k^)  with  a  €  L2(Y^l,k  1,fc2))  will 

be  referred  to  as  the  incoming  singular  function  expansion  due  to  some  charge  distribution 
<7  on  x(l,kl’k2\ 


3.2  Singular  Function  Expansions  of  the  Potentials 

The  following  theorem  provides  a  tool  for  approximating  potentials  produced  by  arbitrary 
charge  distributions. 

Theorem  3.1.  Suppose  that  the  outgoing  potential  g(l,k  1,k2>  6  L2(X^l,kl,k2^)  is  induced  by 
the  charge  distribution  <j(l’ki<k2)  ;  i.e. 

gV,k ite)(x)  =  (pUMte) .  a{i,k i.fcj))^)  =  J  ^  K(x,y)a^k2\y)  dy.  (63) 
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Then 


OO 

—  y  u°ut,(l,kiM)  aout,(l,kiM) 

k=l 

with  the  coefficients  o^ut’^’fcl’^2)  given  by  the  formula 

=  X, . 

Furthermore,  for  any  p>  1, 

fc=i  k  ; 

—  sp+l  ll17  jlz,2(y(i,*i,fc2))> 


and 


£|a*ut,(M:i ^)|2  < 

fe= 1 


£2(y(l,k1,k2)y 


(64) 


(65) 


(66) 

(67) 


Proo/.  (66)  follows  immediately  from  Theorem  2.10.  Singular  values  sy>(l’k^k2)  converge 
to  zero  as  A:  -*  oo;  therefore,  (66)  implies  (64).  Finally,  due  to  (65),  a^l,kl}k^  are  the 
coefficients  in  the  orthonormal  basis  {v^l’kl ’fcz) },  from  which  (67)  follows  immediately.  □ 


3.3  Translation  Operators  and  Error  Bounds 

The  following  three  theorems  constitute  the  principal  analytical  tool  for  manipulating  out¬ 
going  and  incoming  singular  function  expansions.  Theorems  3.2,  3.4  provide  formulae  for 
the  translation  of  outgoing  and  incoming  singular  function  expansions,  respectively.  Theo¬ 
rem  3.3  describes  a  mechanism  for  converting  an  outgoing  singular  function  expansion  into 
an  incoming  singular  function  expansion. 

Theorem  3.2  (Outgoing  to  Outgoing).  Suppose  that  the  outgoing  singular  function  ex¬ 
pansion  gout’(l>k i>*2)  :  L2(X(l’kl'k2'>)  -»  R  is  given  by  the  formula 

OO 

gOut,(l,kiM _  y  u<mt,(lMM)  ^syAlMM) 

k=l 

with  the  coefficients  a£lt’(l,kl,k2)  sucfi  that 


y  laout,m,k2)l2 


<  +00, 


and  that  Y^,k  1,k2^  c  1,Tn2). 

Then  there  exists  a  linear  mapping 

i2(JV) 


(69) 


(70) 
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converting  the  sequence  of  coefficients  ’**)},  k  =  1,2,...  into  the  sequence  { a™ 1>mi>m2)}j 

m  —  1,2,...,  defined  by  the  formulae 


00 

out, (2-1, 7711,7712)  _  V~^  A(l-l,mi,m2),(l,ki,k2)„out,(l,ki,k2) 
um  —  2^  “A:  > 

k=l 


■^mfcl  mi ’^2)  =  J  v™t^l’kl,k2\y)v™t’(l~l'mi,rn2\y)  dy, 

such  that  for  all  x  inside 


(71) 

(72) 


out,(lM, 


k2\x) 


Y  y^™tXl-l,Tni,m2) ^xjsout,(l-l,mi,rn.2) aout,(l-l,mi,m2) 


(73) 


Furthermore,  for  any  p>  1, 


m=l 


^  out, (l— 1, mi  ,777,2) 

-  sP+i 


00 


I  oui,(2,fci,fc2) 
I  ak 


(74) 


Proof.  We  observe  that  can  be  viewed  as  the  potential 

gOuiAlM ,k2) (x)  =  [  ...  .K (x, y)a ^ (y) dy  (75) 

induced  by  the  charge  distribution  aV-tiM)  .  defined  by  the  formula 

°° 

(y)  =  y(™WMto)vout,{lM,k2)^yy  (75) 

fc=l 


We  will  denote  by  cr^  1,mi,m2)  the  charge  distribution  on  the  square  y(i-1,?ril’m2)  given  by 
the  formula 


0.(2-1,7711  ,7712 


a(i,fcl,*2)(y)) 

0, 


if  y  e  y(z>*i’fc2), 

if  1/  €  y(i-l.mi,m2)\yy(2,fci,fc2)) 


(77) 


and  by  1,7711  ,T712^  the  outgoing  potential  on  X^_1,mi  ,T712)  due  to  the  distribution  >m2> 

on  the  square  y(,-1>mi>m2)J  i.e., 


^(Z-l,mi,ma)(a.j  _  ^p(l-l,mi,m2)  .  a(2-l, 7711,71*2)^3.)  _ 

=  L (78) 

Due  to  Theorem  3.1, 


^(2-1,77*1,7712)^.)  _  U»ut, (2-1, 7711, 7712)^^01(2,(2-1,7711,7712)^2, (2-1,7711,7772)  (yg) 


771=1 
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with  the  coefficients  <C(’^  defined  by  the  formula 

aout,(l-l,mum2)  =  jf  ^  ^  ^  a(i-l,m1,m2)(y)vmlt>a-l,ml)m2)(y)  dy  (80) 

Now,  using  (77),  we  have 

=  f  aVMM){y)vout,(l- dy  (81) 

Substituting  (76)  into  (81),  we  arrive  at 


00  /  r 

0mi,(W,mi,m2)  _  y~jQout,(;,fci,fc2)  v™t^l'kl,k2\y)v™t^l~l'mi'm2\y)drjJ  = 


_  V'  out,{l,k\,k2) 

/  ->  ak  A mk  i 

Jt=l 


where 

A^MUi-i^)  =  w«.t,(Z,fcx^)(y)|;««t,(«_i>TOl>TOa)(y)  dy 

Now,  from  the  combination  of  (78)  and  Theorem  2.10,  we  obtain 

E<t’(Z“l’m1’m2)^)Ct’(i_1’miIm2)0mt’(l'1,mi’m2)IL2(x(l-i.mi.fn2))< 

771=1 

<  OUt, (Z— 1,1711, m2)  1 1  (i  — 1,7711  ,7712)  1 1 
^  SP+1  IF  ;|lL2(y('-X.m1,m2)). 

Due  to  (77),  we  have 

11  Sy^^y^'^y)^- 


F 

E  otzt,(J-l,mi,m2)/  \  £mt,(i-l,mi,m2)  out,(Z-l,mi,m2)i| 

w*  V*iv'5m  um  II  L2(A'(,-1>mi» 


m=  1 

<  >TO2)| \a(l,ki  ,fc2) 


•m2)) 


Thus, 


iL2(y(,>*1'fc2))> 


(82) 

(83) 


(84) 


(85) 


k2\x)  -  £  ^>(i-1’mi’mjH®)Ct’('“1’mi,m2)Ct’(i_1’mi’m2)IL2(^(i-i,-i.-2)) 


||yout)(Z,*i,k2)(a.)  _ 

m=l 

o0irt,(Z-l,mi,m2) 

-  SP+1  ^ 

00 

£i< 

k= 1 

out,{lMM) 

h 


I2- 


(86) 


Finally,  the  singular  values  converge  to  zero  as  k  -+  00;  therefore,  (86)  implies 

(73),  and  from  the  combination  of  (76),  (77),  (80),  we  have 

p 


y)  ]acm*,(<-l,mi,m2)|2  <  ||0.(/-l,mi,m2)i| 
m=X 


2 

£2(y(,_1'TO1’m2 


>) 


00 

_  y  ,.o^t,(z,fci,fc2)|2 
jfc=i 


(87) 
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□ 

The  proof  of  the  following  two  theorems  is  virtually  identical  to  that  of  Theorem  3.2, 
and  is  omitted. 

Theorem  3.3  (Outgoing  to  Incoming).  Suppose  that  the  outgoing  singular  function  ex¬ 
pansion  gOvUWite)  ;  R  is  given  by  the  formuia 

00 

gOut,(l,kiM)^  _  uout,(lMM)(x}s<mt,(l,kito)aout,(l,kiM) 
k= 1 

with  the  real  coefficients  such  that 


OO 

y^ 

k= 1 


<  +oo, 


and  that  Y^m i.»*a)  C  X^l>kl’k2\ 

Then  there  exists  a  linear  mapping 

jg(J,mx,m2),(J,fci,fc2)  .  _4  /2(JV) 


(89) 


(90) 


converting  the  sequence  of  coefficients  {a™4’^’*1’*2)},  fc  =  1,2, . . .  into  the  sequence 
m  =  1, 2, . . .,  defined  by  the  formulae 


OO 

in, (l, mi, m2)  _  V  T>(l’mi,m2),{l,ki,k2)  out,{l,ki,k2) 
am  ~  Z_j  Dmk  ak  > 

fc=l 


(91) 


B{l,mi,m2Ul,ki,k2)  =  v^lM,k2){yyn,(l,mi,m2){y)  ^ 

such  that  for  all  x  inside  Y^,mi,m2\ 

00 

g0Ut,(l,kUk2)^  =  J-  uin,(l,mi  ,m2 )  (xjstn,(l,mi  ,m2)c[in,(t,m1  ,m2) 
m—1 


and 


00  00 

y  |ain,(i,mi,m2)|2  |aout,(i,fci,fc2)|2 

m=l  fc=l 


Furthermore,  for  any  p>  1, 


(92) 


(93) 


(94) 


p 

||<7owt,(i,*:i’*2)(a0  —  u^’^’Tni,m2^(a:)s^,^,mi’m2^ai”’^’mi’Tn2^|| 

m=l 


£,2(y((,m1,m2)) 


< 


00 

in, (l, mi, mi)  | ,fc2) J2 

-  SP+1  A  2s  \ak  I 

\fc=l 


(95) 
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Theorem  3.4  (Incoming  to  Incoming).  Suppose  that  the  incoming  singular  function 
expansion  gintt>ki,k2)  .  R  js  given  f,y  t^e  formuia 


OO 

gin,(l,kiM)(xj  —  y  v^n^M,k2)^sin,(lMM)ain,(l,ki,k2) 
k=  1 


(96) 


with  the  coefficients  af’^’kl’k2^  such  that 


£K 

k=l 


in,(I,fci,fc2)|2 


<  +oo, 


and  that  y0+l,mi,m2)  c  y(t,fc i,fc2)_ 

TVien  there  exists  a  linear  mapping 


(97) 


Cr(i+llmi,m2),(itfc1^2)  .  j2^  (gg) 

converting  the  sequence  of  coefficients  {a^*’*1’*2)},  k=  1,2,...  info  the  sequence  {a^’('+1’mi’m2)}, 
m  =  1,2,...,  defined  by  the  formulae 


in,[l,m\,m2)  _  /n,(/+l,mi,m2),(/,fci,fc2)  out,{l,ki,k2) 

“m  “2^  Umfc  aA  > 


ik=l 


where 


c(l+hwum2),{lMM)  =f  |;<n,(W1^)(y)t;ft?I(,+l,m1>m2)(y)  dy> 
J  X't,k  1»*2/ 

suc/i  that  for  all  y  inside  yO+i»«u,m2)^ 

00 

gin,(l,kuk2)^  =  y 

um,{l+l,mi,m2)  ^x^sinXl+l,mi,m2)  ain,(l+l,mi,m2) 

and 


m=l 


y  |am,(J+l,mi,m2)|2  <  ^  p 

”i=l  fc=l 

Furthermore,  for  any  p>  1, 


_  £  ^,(X+l,mllTO2)(x)sm>(i+1,ml!m2)a^,(/+1,m1,m2)||i2(^j+i  miim2))  < 
m=l 

(103) 


(99) 

(100) 

(101) 

(102) 


^  m,(l+l,mi,m2) 

-  sp+ 1 


N 


Ew 

/t=i 


in,(l,ki,k2)  |2 
A;  I  • 


3.4  Singular  Value  Decompositions  of  Translation  Operators 

The  algorithm  of  the  following  section  (like  its  counterpart  for  harmonic  fields  described, 
for  example,  in  [3])  depends  on  the  efficient  application  of  the  translation  operators  (70), 
(90),  (98)  to  arbitrary  vectors.  Clearly,  these  operators  convert  functions  on  the  square 
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into  functions  on  the  square,  and  could  be  extremely  expensive  to  deal  with  numerically. 
Fortunately,  Theorems  2.7,  2.8  of  Section  2  guarantee  that  (asymptotically  speaking)  the 
cost  of  applying  each  of  the  operators  (70),  (90),  (98)  to  an  arbitrary  vector  is  of  the  order 

c  +  d-  log(e)4,  (104) 

with  the  constants  c,  d  independent  of  the  operator  to  be  applied  (as  long  as  the  conditions 
of  Theorem  2.8  are  satisfied).  We  will  discuss  the  procedure  for  the  efficient  numerical 
evaluation  of  the  operator  (90)  in  some  detail;  the  operators  (70),  (98)  axe  in  this  respect 
identical  to  the  operator  (90). 

Let  us  consider  the  operator  (90)  with  some  m\,m2,k\,k2.  Choosing  some  natural  n,  we 
construct  annxn  tensor-product  Gaussian  discretization  of  each  of  the  squares  y0>m  i>m2), 
yO,fc i,*2),  anc[  expand  the  kernel  K  on  y0-m i>m2)  x  Y^l,kl,k 2>  into  a  4-dimensional  tensor 
product  Legendre  series.  Due  to  Theorem  2.8,  the  error  of  such  an  expansion  is  bounded 
by 

6(1  +  n)4  •  qn,  (105) 

where  6  is  a  positive  constant  and  |g|  <  1.  Choosing  n  =  c  +  d-  log(e),  we  guarantee  that 
the  error  of  our  expansion  is  less  than  any  axbitraxy  a-priori  prescribed  e.  An  examination 
of  (105)  shows  that  the  length  of  the  expansion  required  to  obtain  reasonable  accuracy 
is  not  excessive,  though  it  is  considerably  greater  than  the  lengths  expansions  required 
for  haxmonic  kernels  (see,  for  example,  [3]).  An  additional  improvement  in  the  required 
lengths  of  expansions  is  obtained  by  replacing  the  tensor-product  Legendre  expansions  of 
the  operators  (70),  (90),  (98)  with  their  Singular  Value  Decompositions  via  Theorems  2.9, 
2.10,  2.11.  The  cost  of  this  latter  step  (in  terms  of  CPU  time  requirements)  is  of  the  order 
p3,  and  would  be  excessive,  except  for  the  fact  that  this  procedure  has  to  be  performed  only 
once  for  each  kernel,  since  the  necessary  SVDs  can  be  precomputed  and  stored;  needless  to 
say,  this  requires  an  amount  of  storage  proportional  to  p  •  n2. 

Remark  3.5.  The  situation  is  simplified  when  the  kernel  K  is  convolutional,  i.e  depends 
only  on  the  difference  between  its  arguments.  Indeed,  in  this  case,  the  SDVs  of  the  trans¬ 
lation  operators  A^~^’mi'm2^’^,kl,k2\  B^,mi,m2^’^’kl,k 2\  1>mi>m2)>0’fcl>fc2)  not  have  to 

be  calculated  for  all  interacting  pairs  of  squares  on  all  levels,  but  only  for  all  interactions 
of  a  single  square  on  each  level.  In  this  case,  the  construction  of  the  SVDs  requires  trivial 
amounts  of  both  CPU  time  and  disk  space.  When  the  kernel  K  is  not  only  convolutional  but 
possesses  additional  symmetry  (rotational,  up-down,  etc.)  the  situation  is  further  simplified. 

4  Generalized  Fast  Multipole  Method  in  Two  Dimensions 

4.1  Notation 

In  this  section  we  will  introduce  the  notation  to  be  used  in  the  description  of  the  algorithm. 

For  any  subset  A  of  the  computational  box,  T(A)  will  denote  the  set  of  particles  inside 
A. 

Bi  is  the  set  of  all  nonempty  boxes  at  the  level  l.  Bq  will  denote  the  computational  box 
itself. 


14 


If  box  contains  more  than  s  particles,  it  is  called  a  parent  box.  Otherwise,  the  box  is 
said  to  be  childless.  Note  that  s  is  the  maximum  number  of  points  in  a  childless  box. 

A  child  box  is  nonempty  box  obtained  from  the  division  of  a  parent  box  into  four. 

Colleagues  are  adjacent  boxes  of  the  same  size  at  the  same  level.  A  given  box  has  at 
most  eight  colleagues. 

Two  boxes  b  and  c  are  said  to  be  well  separated  if  they  are  separated  a  distance  greater 
or  equal  to  the  length  of  the  size  of  the  smallest  box. 

With  each  box  b  at  the  level  l,  we  will  associate  five  lists  of  other  boxes. 

List  1  of  a  box  b  will  be  denoted  by  Ub ■  It  is  empty  if  6  is  a  parent  box.  If  6  is  childless, 
it  consists  of  6  and  of  all  childless  boxes  c  that  are  adjacent  to  b. 

List  2  of  a  box  b  will  be  denoted  by  V&.  It  consists  of  all  boxes  c  that  are  children  of  the 
colleagues  of  the  6’s  parent  and  that  are  well  separated  from  b. 

List  3  of  a  box  b  will  be  denoted  by  Wb.  It  is  empty  if  b  is  a  parent  box.  If  b  is  childless, 
it  consists  of  all  descendants  of  6’s  colleagues  whose  parent  are  adjacent  to  b  but  who  are 
not  adjacent  to  b  themselves.  Note  that  b  is  separated  from  each  box  c  in  Wb  by  a  distance 
greater  or  equal  to  the  length  of  the  size  of  c. 

List  4  of  a  box  b  will  be  denoted  by  Xb-  It  consists  of  all  boxes  c  such  that  b  G  Wc.  Note 
that  all  boxes  in  List  4  are  childless  and  larger  that  b. 

List  5  of  a  box  b  will  be  denoted  by  Yj.  It  consists  of  all  boxes  c  that  are  well  separated 
from  6’s  parent. 

$6  will  denote  the  p-term  outgoing  singular  function  expansion  for  the  box  6. 

Wj,  will  denote  the  p-term  incoming  singular  function  expansion  for  the  box  6. 

r6  will  denote  the  p-term  incoming  singular  function  expansion  for  the  box  6  due  to  all 
particles  in  T(Vb). 

A b  will  denote  the  p-term  incoming  singular  function  expansion  for  the  box  6  due  to  all 
charges  in  T{Xb). 

$t(r)  is  the  result  of  evaluation  of  the  expansion  'I'j,  at  a  particle  r  G  T(6). 

a;,(r)  will  denote  the  potential  at  r  G  T(6)  due  to  all  particles  in  T(Ub). 

t %(r )  will  denote  the  potential  at  r  G  T(6)  due  to  all  particles  in  T(Wj,). 

7&(r)  will  denote  the  potential  at  r  G  T(6)  due  to  all  particles  in  T(Yb). 

F(r)  will  denote  the  potential  at  r. 

AbtC  will  denote  the  translation  operator  (a  p  x  p  matrix)  in  the  Theorem  3.2  for  the 
boxes  6  and  c  such  that  6  =  i>m2)  and  c  = 

Bb,c  will  denote  the  translation  operator  (a  pxp  matrix)  in  the  Theorem  3.3  for  the 
boxes  6  and  c  such  that  6  =  y(I>mi>m 2)  and  c  =  y(*>fc 

Cb,c  will  denote  the  translation  operator  (a  pxp  matrix)  in  the  Theorem  3.4  for  the 
boxes  6  and  c  such  that  6  =  y(z+1>mi,m2)  and  c  =  y(J«*i>*2). 

4.2  Informal  Description  of  the  Algorithm 

1.  Create  the  adaptive  quad- tree.  Compute  the  outgoing  and  incoming  singular  functions 
for  each  box  in  the  computational  tree,  by  the  means  of  the  Theorem  2.11. 

2.  For  each  childless  box  6,  the  interactions  between  particles  in  T(6)  and  T{JJb)  are 
evaluated  directly.  For  each  particle  r  G  T(6)  the  result  is  a&(r). 
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3.  For  each  childless  box  b,  form  an  outgoing  singular  function  expansion  4>&  by  the 
means  of  Theorem  3.1.  For  each  parent  box  b,  use  Theorem  3.2  to  translate  and 
merge  the  outgoing  singular  function  expansions  of  its  children  into  the  outgoing 
singular  function  expansion 

4.  Use  Theorem  3.3  to  convert  the  outgoing  singular  expansion  of  each  box  in  Vb  into  the 
incoming  singular  function  expansion  in  the  box  6,  adding  the  resulting  expansions 
together  to  obtain  T*,. 

5.  Convert  the  potential  of  all  particles  in  T(Xb )  into  a  incoming  singular  function  ex¬ 
pansion  in  the  box  b,  adding  the  resulting  expansions  to  obtain  A*,.  Add  A b  to  r^. 

6.  For  each  childless  box  b,  evaluate  the  potential  /3j(r)  due  to  all  particles  in  T(W&)  by 
evaluating  the  outgoing  singular  function  expansions  $c  for  each  box  c  €  W&. 

7.  Translate  the  incoming  singular  function  expansion  Tb  of  6’s  parent  B  to  the  box  b 
by  the  means  of  Theorem  3.4.  Add  the  resulting  local  expansion  to  T;,. 

8.  For  each  childless  box  b,  evaluate  the  local  expansion  T&  at  every  particle  r  £  b  and 
add  the  result  to  aj(r)  and  /3&(r),  obtaining  the  potential  F(r)  at  r. 

4.3  Detailed  Description  of  the  Algorithm 

Step  1:  Initialization 

Comment  [  Set  the  order  n  of  Legendre  expansions,  the  number  of  terms  p  in  all  singular 
function  expansions,  and  the  maximum  number  s  of  the  particles  in  a  childless  box.  Create 
the  computational  tree.  ] 

do  l  =  0, 1,2,. . . 
do  b  €  Bt 

if  b  contains  more  than  s  particles  then 
subdivide  b  into  four  smaller  boxes, 
ignore  empty  boxes,  add  nonempty  boxes  to  Bi+ 1- 
endif 
enddo 
enddo 


Comment  [  For  each  box  b  in  the  computational  tree,  compute  the  outgoing  and  incoming 
singular  value  decompositions  of  the  kernel  K.  ] 

do  l  =  0, 1, 2, . . . 
do  6  €  Bi 

Set  b  =  Y('’fcl’*2).  Compute  two  singular  value  decompositions  for  x  € 
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K(x,y)  =  £«*) .  •  «#(»), 

fc=l 

00 

*(»,*)  =  E  «&(»)•'&  •»&(*)• 

ifc=l 

enddo 

enddo 


Step  2:  Local  Interactions 

Comment  [  For  each  childless  box  b,  evaluate  interactions  with  the  particles  in  T(Ub) 
directly,  obtaining  the  potential  due  to  nearby  particles.  ] 

do/ =  0,1,2,... 

do  b  G  Bi,  b  is  childless 

do  Xi  G  T(b),Xj  G  T(Ub) 


enddo 

enddo 

enddo 


a>b(xi)  -  ab(xi )  +  K(xi,Xj). 

i 


Cost  [  9(N/s)  ■  s  ■  s  +  8 (N/s)  ■  s  ■  s  operations.  ] 


Step  3:  Outgoing  Singular  Function  Expansions 


Comment  [  For  each  childless  box  b,  form  the  outgoing  singular  function  expansion  $6.  ] 

do  /  =  0,1,2,... 

do  6  G  Bi,  b  is  childless 

Evaluate  the  coefficients  of  the  outgoing  singular  function  expansion  for  the 
square  b  by  the  means  of  the  Theorem  3.1., 


for  all  k  =  1, . . .  ,p. 
enddo 
enddo 


®b;k  =  2  •  V%k(Xj), 

Xj€b 
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Cost  [  Np  operations.  ] 


Step  4:  Upward  Sweep 

Comment  [  For  each  parent  box  b,  form  the  outgoing  singular  function  expansion  by 
translating  the  outgoing  singular  function  expansions  of  b’s  children  and  adding  the  resulting 
expansions  together.  ] 

do  /  =  ...,  2, 1, 0 

do  6  €  B[,  6  is  a  parent  box 

Use  Theorem  3.2  to  translate  and  merge  the  outgoing  singular  function  ex¬ 
pansions  of  b's  children  b\,  62,  63,  64  into  the  outgoing  singular  function 
expansion  $5 


enddo 

enddo 


$6  —  $6  +  Abfa  ■  $&!  +  Abfo  •  $i>2  +  ^4.6,63  '  ^63  +  ^6,64  '  ^64 


Cost  [  ( 4/3)(N/s )  ■  p2  operations.  ] 


Step  5:  Adaptive  Part 


Comment  [  For  each  childless  box  b ,  form  the  incoming  singular  function  expansion  A 5 
due  to  particles  located  in  List  4  of  b.  ] 

do  l  =  0,1,2, ... 

do  6  €  Bi,  b  is  childless 

Use  Theorem  3.1  to  evaluate  the  coefficients  of  the  incoming  singular  function 
expansion  A  &  for  the  square  b 


for  all  k  =  1 . . .  ,p. 
enddo 
enddo 


Ab-,k  =  Y.  •<*(*»), 


Cost  [  8 {N/s)  -  p-  s  operations.  ] 

Comment  [  For  each  box  6,  evaluate  the  outgoing  singular  function  expansion  at  each 
particle  located  in  boxes  c  in  List  4  of  b.  ] 

do  /  =  0, 1, 2, . . . 

do  b  €  Bi,  b  is  childless 
do  Xi  €  Xb 
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enddo 

enddo 

enddo 


(3b(Xi)  =  (3b(xi)  +  •  s$  •  v$(xi). 


k=i 


Cost  [  8 (N/s)  -p  -  s  operations.  ] 


Step  6:  Outgoing  to  Incoming 

Comment  [  For  each  box  b,  convert  the  outgoing  singular  function  expansion  $c  for  each 
box  c  in  List  2  of  b,  into  the  incoming  singular  function  expansion  Tj,  adding  the  resulting 
expansions  together.  ] 

do  l  =  0,1,2, ... 
do  6  €  Bi 

For  all  boxes  c  E  Vb,  convert  the  outgoing  singular  function  expansion  into 
the  incoming  singular  function  expansion  for  the  box  b  by  the  means  of 
Theorem  3.3.  Add  the  resulting  singular  function  expansions  to  Fj, 


r&  =  Tb  +  32  BbtC  •  $c- 
cev6 

Add  Tfe  and  Ab  to  obtain  the  incoming  singular  function  expansion  tyb 


enddo 

enddo 


%  =  r6  +  Ab. 


Cost  [  27-  (4/3)(iV/s)  ■  p2  operations.  ] 


Step  7:  Downward  Sweep 

Comment  [  For  every  parent  box  b,  translate  the  incoming  singular  function  expansion  tyb 
to  6’s  children  incoming  singular  function  expansions.  ] 

do  l  =  0,1,2, ... 

do  b  €  Bi,  b  is  a  parent  box 
do  c  €  Bi+ 1,  c  is  a  b's  child 

Translate  the  incoming  singular  function  expansion  vp*,  by  the  means  of  The¬ 
orem  3.4.  Add  the  resulting  local  expansion  to 

+  Cc,6  •  ^b- 
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enddo 

enddo 

enddo 


Cost  [  (4/3 ){N/s)  ■  p2  operations.  ] 


Step  8 

Comment  [  For  every  clxildless  box  b,  evaluate  incoming  singular  function  expansions  'J'j 
at  each  particle,  obtaining  the  potential  due  to  distant  particles.  Find  the  potential  at  r  €  b 
by  adding  ab{r),  /3b{r),  7 b(r)  together.  ] 

do  l  =  0, 1,2, ... 

do  b  €  Bi,  b  is  childless 

For  each  particle  Xj  €  6,  evaluate 

7 b(xj)  =  Y  ^b-,k  ■  4*  ■ 

k= 1 

Add  ab(xj),  /3b(xj),  7b(xj)  to  obtain  the  potential  F(xj)  at  Xj  6  b 
F{xj)  =  ab(xj)  +  pb{xj)  +  7 b(xj). 


enddo 

enddo 


Cost  [  N  ■  p  operations.  ] 

4.4  Complexity  of  the  Algorithm 

Since  s  is  the  average  number  of  particles  in  a  childless  box  at  the  finest  level,  there  are 
approximately  N/s  childless  boxes,  and  approximately 

B  =  (1  +  1/4  +  1/42  +  ...)  •  {N/s)  =  |  •  —  (106) 

3  s 

boxes  in  the  tree  hierarchy.  Therefore,  Step  3  requires  Np  work,  Step  4  requires  Bp2  work, 
Step  6  requires  27 Bp2  work,  Step  7  requires  Bp 2  work,  Step  8  requires  Np  work,  and  Step 
2  requires  9  •  N/s  ■  s  •  s  =  9 Ns  work.  Thus,  a  reasonable  estimate  for  the  total  operation 
count  is 

9 Ns  +  2 Np  +  29 Bp2  =  9Ns  +  2Np  +  29  •  \  ■  —  ■  p2.  (107) 

o  S 

With  s  =  2 p,  the  operation  count  becomes  approximately 

40  Np.  (108) 


20 


The  adaptive  part  of  the  algorithm  in  the  Step  5  requires  0(8(N/s)ps  +  8 (N/s)ps)  = 
0(16iVp)  work,  and  Step  3  requires  additional  0(8(N/s)s2)  =  0(8Ns)  work.  The  total 
operation  count  is 

l7Ns  +  18  Np  +  29  Bp2  =  17Ns  +  18JVp  +  29(4/3)  (iV/s)p2 .  (109) 

By  setting  s  =  1.5p,  the  operation  count  becomes  approximately 

69  Np.  (110) 


5  Numerical  Results 


A  FORTRAN  program  has  been  written  implementing  the  algorithm  described  in  the 
preceding  section.  All  timings  listed  below  correspond  to  calculations  performed  on  an 
UltraSparc-I/167  computer  with  128MB  RAM,  using  double  precision  arithmetic.  The  or¬ 
der  of  Legendre  expansions  was  n  =  4,  n  =  8,  and  n  =  16  and  the  number  of  singular 
functions  varied  from  p  =  9  to  p  =  36  to  p  =  90  in  order  to  achieve  roughly  3,  6  and  10 
digits  accuracy,  respectively. 

The  results  of  these  experiments  are  presented  in  the  tables  below.  The  first  column  con¬ 
tains  the  number  of  particles  used  in  the  simulation.  The  second  column  contains  the  time 
for  construction  of  the  computational  tree  and  precomputation  of  values  singular  functions 
at  locations  of  particles.  This  can  be  done  once  for  any  given  configuration  of  particles. 
We  do  not  include  the  time  for  precomputation  of  singular  value  decompositions  in  this 
column,  since  this  can  be  done  in  advance  for  any  given  kernel.  The  third  column  contains 
the  total  run  time  of  the  algorithm.  The  fourth  and  the  fifth  columns  contain  the  actual 
time  required  by  the  algorithm  and  the  time  required  by  the  direct  algorithm,  respectively. 

Finally,  the  last  two  columns  contain  the  relative  2-norm  E2  and  the  relative  maximum 
error  Eoo  obtained  at  any  one  particle.  They  are  defined  by  the  formulae 


E2  = 


V  ) 


& 


=  max 
i 


I M  ’ 


(in) 


where  /,-  is  the  value  of  the  potential  at  the  i- th  particle  position  obtained  by  the  direct 
calculation,  and  /»  is  the  result  obtained  by  the  algorithm. 

For  the  first  set  of  tests,  the  positions  of  particles  were  uniformly  distributed  in  the  unit 
square.  For  the  second  set  of  tests,  two  fifth  of  charged  particles  were  distributed  uniformly 
along  two  ellipses  and  the  remaining  of  particles  were  distributed  randomly  in  three  circles 
with  a  gaussian  density.  The  number  of  terms  in  the  singular  function  expansions  was  set 
to  9,  36  and  90,  and  the  number  of  particles  in  a  childless  box  was  set  to  15,  61,  and  153, 
respectively. 

Several  observations  can  be  made  from  Tables  1-12  below,  and  from  the  more  extensive 
numerical  experiments  performed  by  the  authors. 

1.  The  number  of  singular  functions  required  to  obtain  3-digit  accuracy  is  9;  the  cor¬ 
responding  order  of  the  Legendre  expansions  is  4.  The  6-digit  scheme  requires  36-term 
singular-function  expansions,  and  Legendre  expansions  of  order  8.  In  order  to  obtain  10 
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digits,  we  used  90-term  singular  function  expansions,  and  obtained  these  (during  the  pre- 
computation  stage)  by  starting  with  Legendre  expansions  of  order  16. 

2.  For  the  3-digit  version  of  the  scheme,  the  break-even  point  with  the  direct  scheme  is 
n  ~  200;  for  6  digits,  the  break-even  point  is  n  ~  800,  and  for  10-digits  the  scheme  becomes 
faster  than  the  direct  one  at  n  ~  3000. 

3.  The  efficiency  of  the  algorithm  does  not  suffer  significantly  when  the  charges  in  the 
simulation  are  clustered.  On  the  other  hand,  unlike  its  counterpart  for  harmonic  kernels, 
the  algorithm  of  this  paper  does  not  seem  to  derive  any  advantage  from  the  clustering  of 
particles  in  the  simulation. 

4.  The  cost  of  the  algorithm  grows  rapidly  with  the  increase  of  accuracy  requirements. 
The  algorithm  is  considerably  slower  than  modern  versions  of  the  FMM  for  harmonic  fields, 
especially  in  high-accuracy  environments  (see,  for  example,  [10]). 

5.1  Generalizations  and  Conclusions 

The  algorithm  of  this  paper  has  an  obvious  analogue  in  three  dimensions:  quad-trees  are 
replaced  with  oct-trees,  two-dimensional  expansions  are  replaced  with  three-dimensional 
ones,  and  the  programming  becomes  more  involved.  Such  a  scheme  has  been  implemented 
(see  [6]),  and  found  to  work  satisfactorily,  as  long  as  the  required  precision  is  low.  For  accu¬ 
racies  better  than  three  or  four  digits,  the  CPU  time  requirements  of  the  three-dimensional 
scheme  become  excessive. 

For  many  kernels,  the  algorithm  of  this  paper  can  be  accelerated  via  an  approach  similar 
to  the  one  used  by  [4],  [9],  [10]  to  accelerate  the  FMM  for  harmonic  fields  in  two  and 
three  dimensions.  Specifically,  most  the  operators  (70),  (90),  (98)  can  be  diagonalized; 
this  requires  that  the  kernel  K  be  approximated  by  linear  combinations  of  exponentials  on 
appropriately  chosen  parts  of  the  product  Y^l,kl,k^  x  X^l,kl,k2\  Needless  to  say,  this  can 
not  be  done  for  a  “general”  kernel  K;  however,  it  appears  to  be  possible  for  many  kernels 
(and  classes  of  kernels)  of  interest.  Such  a  scheme  would  require  several  developments  (both 
analytic  and  numerical);  it  would  accelerate  the  two-dimensional  version  of  the  algorithm 
significantly.  The  real  pay-off  of  such  a  project  would  be  in  three  dimensions,  where  it 
would  be  likely  to  make  large-scale  high-precision  simulations  feasible. 
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Figure  1:  The  computational  box  and  three  levels  of  refinement. 
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Figure  2:  Non-uniform  distribution  of  charges  and  its  associated  adaptive  quad-tree. 
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Figure  3:  Box  b  and  its  associated  Lists  1  to  4  for  the  charge  distribution  in  Figure  2. 
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IV 

Tjnit(s) 

Tale  (s) 

Trun(s) 

Tdir  (s) 

E2 

Eoo 

200 

0.007 

0.009 

0.015 

0.019 

0.11770E-03 

0.85266E-03 

400 

0.015 

0.018 

0.034 

0.076 

0.27390E-03 

0.19749E-02 

800 

0.024 

0.047 

0.071 

0.310 

0.29473E-03 

0.20307E-02 

1600 

0.062 

0.089 

0.151 

1.344 

0.39506E-03 

0.36146E-02 

3200 

0.105 

0.213 

0.318 

5.371 

0.42503E-03 

0.38485E-02 

6400 

0.266 

0.399 

0.666 

21.783 

0.49194E-03 

0.43736E-02 

Table  1:  Uniformly  distributed  particles.  K(x,y)  —  l/\x-  y\,  s  =  15,  p  -  9,  and  n  —  4. 


N 

Tinit(s) 

Talg  (s) 

Trun(s) 

Trfir(s) 

E 2 

Eoo 

400 

0.066 

0.042 

0.107 

0.075 

0.37968E-07 

0.36455E-06 

800 

0.124 

0.130 

0.254 

0.309 

0.30664E-07 

0.23301E-06 

1600 

0.255 

0.251 

0.505 

1.347 

0.59016E-07 

0.63131E-06 

3200 

0.492 

0.684 

1.176 

5.375 

0.67426E-07 

0.67145E-06 

6400 

0.997 

1.230 

2.227 

21.756 

0.16065E-06 

0.16568E-05 

Table  2:  Uniformly  distributed  particles.  K(x,  y)  =  l/\x  -  y\,  s  =  61,  p  =  36,  and  n  =  8. 


N 

Tinit(s) 

Talg(s) 

Trun(s) 

Tdir{s) 

E2 

Eoo 

800 

0.832 

0.213 

1.045 

0.316 

0.35519E-11 

0.27597E-10 

1600 

1.625 

0.580 

2.205 

1.342 

0.27911E-11 

0.23206E-10 

3200 

3.210 

1.374 

4.515 

5.371 

0.47909E-11 

0.35374E-10 

6400 

6.301 

3.138 

9.438 

21.798 

0.40687E-11 

0.47116E-10 

Table  3:  Uniformly  distributed  particles.  K(x,y)  =  l/\x  -  y\,  s  =  153,  p  =  90,  and  n  =  16. 


N 

Tj'mt(s) 

Talg  (s) 

Trun(.s) 

Tdir{s) 

E2 

Eoo 

200 

0.007 

0.007 

0.014 

0.014 

0.33680E-06 

0.11237E-02 

400 

0.015 

0.016 

0.031 

0.055 

0.24487E-05 

0.46567E-02 

800 

0.024 

0.037 

0.061 

0.227 

0.75789E-05 

0.67792E-02 

1600 

0.063 

0.077 

0.140 

1.016 

0.36380E-04 

0.82441E-02 

3200 

0.105 

0.173 

0.278 

4.064 

0.10114E-03 

0.11347E-01 

6400 

0.267 

0.353 

0.619 

16.397 

0.42311E-04 

0.12510E-01 

Table  4:  Uniformly  distributed  particles.  K(x,y )  =  1/fx  -  y\2,  s  =  15,  p  =  9,  and  n  =  4. 
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Figure  4:  Uniformly  distributed  particles  and  the  associated  partition  of  the  computational 
box. 


N 

Tjnit  (s) 

Talg(s) 

Trun{s) 

T dir  {s') 

E2 

£oo 

400 

0.065 

0.033 

0.099 

0.055 

0.33610E-09 

0.96336E-06 

800 

0.126 

0.098 

0.223 

0.225 

0.74619E-09 

0.55977E-06 

1600 

0.254 

0.210 

0.465 

1.016 

0.59034E-08 

0.21584E-05 

3200 

0.493 

0.529 

1.022 

4.090 

0.18124E-07 

0.17612E-05 

6400 

0.996 

1.036 

2.031 

16.365 

0.14692E-07 

0.47616E-05 

Table  5:  Uniformly  distributed  particles.  K(x,y)  =  l/\x  —  y|2,  s  =  61,  p  =  36,  and  n  =  8. 
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N 

Tinit  (fi) 

Talq{s) 

Trun{s) 

Tdir  (5) 

Ei 

Eoo 

800 

0.826 

0.180 

1.006 

0.231 

0.14987E-12 

0.17505E-09 

1600 

1.597 

0.450 

2.047 

1.009 

0.32363E-12 

0.74589E-10 

3200 

3.205 

1.217 

4.422 

4.104 

0.20036E-11 

0.25330E-09 

6400 

6.315 

2.507 

8.823 

16.404 

0.46900E-12 

0.16662E-09 

Table  6:  Uniformly  distributed  particles.  K(x,  y)  =  l/\x  —  y\2,  s  =  153,  p  =  90,  and  n  =  16. 


Figure  5:  Highly  non-uniformly  distributed  particles  and  the  associated  partition  of  the 
computational  box. 
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N 

Tinit(s) 

Talg(s) 

Trun(s) 

Tdir(s) 

e2 

Eoo 

200 

0.008 

0.010 

0.019 

0.019 

0.13524E-03 

0.87697E-03 

400 

0.016 

0.029 

0.045 

0.076 

0.20754E-03 

0.11468E-02 

800 

0.029 

0.058 

0.087 

0.309 

0.26133E-03 

0.12042E-02 

1600 

0.057 

0.126 

0.183 

1.344 

0.32551E-03 

0.26410E-02 

3200 

0.114 

0.245 

0.358 

5.368 

0.37247E-03 

0.34192E-02 

6400 

0.224 

0.475 

0.699 

21.788 

0.42360E-03 

0.35911E-02 

Table  7:  Highly  non-uniformly  distributed  particles.  K(x,y)  =  l/\x  —  y\,  s  =  15,  p  =  9, 
and  n  =  4. 


N 

Tjmt(s) 

Talg(s) 

Trun(s) 

Tdir  (s) 

e2 

Eoo 

400 

0.065 

0.060 

0.125 

0.076 

0.59124E-07 

0.61426E-06 

800 

0.140 

0.139 

0.279 

0.315 

0.77114E-07 

0.11068E-05 

1600 

0.264 

0.413 

0.677 

1.336 

0.10049E-06 

0.97051E-06 

3200 

0.528 

0.834 

1.363 

5.439 

0.12151E-06 

0.12184E-05 

6400 

1.052 

1.867 

2.919 

21.761 

0.15353E-06 

0.15668E-05 

Table  8:  Highly  non-uniformly  distributed  particles.  K(x,y)  =  l/\x  —  y\,  s  =  61,  p  =  36, 
and  n  =  8. 


N 

Tinit{$) 

TQjff(s) 

TrUn{s) 

Tdir  is) 

£2 

Eoo 

800 

0.805 

0.250 

1.055 

0.314 

0.40445E-11 

0.87339E-10 

1600 

1.716 

0.603 

2.319 

1.338 

0.61795E-11 

0.75092E-10 

3200 

3.334 

1.769 

5.103 

5.442 

0.88132E-11 

0.85507E-10 

6400 

6.540 

5.366 

11.906 

21.810 

0.11716E-10 

0.12124E-09 

Table  9:  Highly  non-uniformly  distributed  particles.  K(x,y)  =  l/|x  —  y\,  s  =  153,  p  =  90, 
and  n  =  16. 
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N 

Tinit($) 

Talg{s) 

Trun(s) 

Tiiir{s ) 

#2 

Eoo 

400 

0.016 

0.023 

0.039 

0.055 

0.44531E-04 

0.19765E-02 

800 

0.029 

0.045 

0.075 

0.226 

0.72969E-04 

0.37896E-02 

1600 

0.058 

0.100 

0.158 

1.016 

0.98016E-04 

0.70910E-02 

3200 

0.115 

0.197 

0.312 

4.064 

0.24054E-03 

0.57700E-02 

6400 

0.225 

0.382 

0.608 

16.405 

0.23213E-03 

0.82506E-02 

Table  10:  Highly  non-uniformly  distributed  particles.  K(x,y)  =  l/\x  —  y\2,  s  =  15,  p  =  9, 
and  n  =  4. 


N 

Tinit{s) 

Talg(s) 

Trun{s) 

T<lir{s) 

E2 

Eqo 

400 

0.065 

0.045 

0.110 

0.054 

0.61825E-08 

0.15019E-05 

800 

0.140 

0.108 

0.247 

0.234 

0.10608E-07 

0.20936E-05 

1600 

0.265 

0.312 

0.577 

1.016 

0.13661E-07 

0.18906E-05 

3200 

0.521 

0.639 

1.160 

4.059 

0.38933E-07 

0.21694E-05 

6400 

1.043 

1.439 

2.481 

16.408 

0.38956E-07 

0.61407E-05 

Table  11:  Highly  non-uniformly  distributed  particles.  K(x,y)  —  l/|rc  —  y|2,  s  =  61,  p  =  36, 
and  n  =  8. 


N 

Tinit{s) 

Talq{s) 

Trun(s) 

Tdir{s) 

£2 

Eqo 

800 

0.805 

0.192 

0.996 

0.230 

0.10539E-11 

0.41111E-09 

1600 

1.717 

0.477 

2.194 

1.010 

0.68055E-12 

0.18332E-09 

3200 

3.338 

1.352 

4.691 

4.144 

0.28719E-11 

0.39139E-09 

6400 

6.540 

4.045 

10.586 

16.411 

0.29936E-11 

0.21587E-09 

Table  12:  Highly  non-uniformly  distributed  particles.  K(x,y)  =  l/|ar— y|2,  s  =  153,  p  =  90, 
and  n  =  16. 
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